On the Locality of Action Domination in Sequential Decision Making
In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers that an action is better than any other in a given state, this action actually happens to also dominate in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment's underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies' value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.
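The influence-radius idea admits a compact sketch. Assuming the action-value functions are Lipschitz with a known constant (the paper's exact definitions and constants may differ), a common derivation bounds the radius by the domination gap divided by twice that constant:

```python
def influence_radius(q_values, lip_const):
    """Radius of the ball around a state in which the greedy action
    provably stays dominant.

    q_values : dict mapping each action to its estimated Q-value at the
               sampled state (illustrative input, not the paper's API)
    lip_const: assumed Lipschitz constant of the Q-functions
    """
    ranked = sorted(q_values.values(), reverse=True)
    gap = ranked[0] - ranked[1]      # domination margin of the best action
    # Within distance gap / (2 * L), neither Q-value can change by more
    # than gap / 2, so their ordering cannot flip.
    return gap / (2.0 * lip_const)
```

An active learner such as LPI can then skip querying rollouts for states falling inside an already-computed influence ball.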
Detecting Olives with Synthetic or Real Data? Olive the Above
Modern robotics has enabled the advancement in yield estimation for precision
agriculture. However, when applied to the olive industry, the high variation of
olive colors and their similarity to the background leaf canopy present a
challenge. Labeling several thousand very dense olive grove images for
segmentation is a labor-intensive task. This paper presents a novel approach to
detecting olives without the need to manually label data. In this work, we
present the world's first olive detection dataset, comprising synthetic and
real olive tree images. This is accomplished by generating an auto-labeled
photorealistic 3D model of an olive tree. Its geometry is then simplified for
lightweight rendering purposes. In addition, experiments are conducted with a
mix of synthetically generated and real images, yielding an improvement of up
to 66% compared to using only a small sample of real data. When access to
real, human-labeled data is limited, a combination of mostly synthetic data and
a small amount of real data can enhance olive detection.
Rollout Sampling Approximate Policy Iteration
Several researchers have recently investigated the connection between
reinforcement learning and classification. We are motivated by proposals of
approximate policy iteration schemes without value functions which focus on
policy representation using classifiers and address policy learning as a
supervised learning problem. This paper proposes variants of an improved policy
iteration scheme which addresses the core sampling problem in evaluating a
policy through simulation as a multi-armed bandit machine. The resulting
algorithm offers performance comparable to the previous algorithm, achieved,
however, with significantly less computational effort. An order of magnitude
improvement is demonstrated experimentally in two standard reinforcement
learning domains: inverted pendulum and mountain-car.
Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented
at EWRL08, to be presented at ECML 200
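The multi-armed bandit view of rollout sampling can be illustrated with a minimal UCB1 sketch; the paper's actual allocation scheme and parameters may differ, and the `rollout` hook and exploration constant `c` here are illustrative assumptions:

```python
import math

def ucb_rollout(actions, rollout, budget, c=2.0):
    """Allocate a fixed simulation budget over candidate actions with UCB1.

    actions: list of candidate actions at the state being evaluated
    rollout: function action -> one sampled return (a single simulation)
    budget : total number of rollouts to spend
    """
    counts = {a: 0 for a in actions}
    sums = {a: 0.0 for a in actions}
    # Pull each arm once so every UCB index is defined.
    for a in actions:
        sums[a] += rollout(a)
        counts[a] = 1
    for t in range(len(actions), budget):
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an action accumulates rollouts.
        ucb = {a: sums[a] / counts[a] + math.sqrt(c * math.log(t) / counts[a])
               for a in actions}
        best = max(ucb, key=ucb.get)
        sums[best] += rollout(best)
        counts[best] += 1
    # Return the empirically best action after the budget is spent.
    return max(actions, key=lambda a: sums[a] / counts[a])
```

Concentrating rollouts on near-optimal actions this way is what lets the scheme match the earlier algorithm's accuracy at a fraction of the simulation cost.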
Adaptation-Based Programming in Haskell
We present an embedded DSL to support adaptation-based programming (ABP) in
Haskell. ABP is an abstract model for defining adaptive values, called
adaptives, which adapt in response to some associated feedback. We show how our
design choices in Haskell motivate higher-level combinators and constructs and
help us derive more complicated compositional adaptives.
We also show an important specialization of ABP is in support of
reinforcement learning constructs, which optimize adaptive values based on a
programmer-specified objective function. This permits ABP users to easily
define adaptive values that express uncertainty anywhere in their programs.
Over repeated executions, these adaptive values adjust to more efficient ones
and enable the user's programs to self-optimize.
The design of our DSL depends significantly on the use of type classes. We
will illustrate, along with presenting our DSL, how the use of type classes can
support the gradual evolution of DSLs.
Comment: In Proceedings DSL 2011, arXiv:1109.032
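The paper's DSL is embedded in Haskell; purely as a language-neutral illustration of the adapt-on-feedback cycle, the following Python sketch models an adaptive value with an epsilon-greedy update (the class and method names are hypothetical, not the paper's API):

```python
import random

class Adaptive:
    """Toy analogue of an ABP 'adaptive': a value drawn from a fixed set
    of candidates, nudged toward choices that receive higher feedback."""

    def __init__(self, candidates, epsilon=0.1):
        self.estimates = {c: 0.0 for c in candidates}  # running mean reward
        self.counts = {c: 0 for c in candidates}
        self.epsilon = epsilon
        self.last = None

    def value(self):
        """Yield the current value: mostly the best-so-far candidate,
        occasionally a random one to keep exploring."""
        if random.random() < self.epsilon:
            self.last = random.choice(list(self.estimates))
        else:
            self.last = max(self.estimates, key=self.estimates.get)
        return self.last

    def feedback(self, reward):
        """Incorporate feedback for the most recently yielded value."""
        c = self.last
        self.counts[c] += 1
        self.estimates[c] += (reward - self.estimates[c]) / self.counts[c]
```

Over repeated value/feedback cycles the estimates converge, so later executions increasingly yield the candidate with the best observed feedback.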
Approximate Policy Iteration using Large-Margin Classifiers
We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding domains.
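The rollout-then-generalize loop can be sketched as follows. The paper uses a multiclass SVM; to keep this sketch dependency-free, a 1-nearest-neighbor classifier stands in for it, and `simulate` is an assumed environment hook rather than the paper's interface:

```python
def rollout_labels(states, actions, simulate, n_rollouts=10):
    """Label each sampled state with its empirically best action.

    simulate(s, a) -> one sampled return of taking a in s and then
    following the current policy (assumed supplied by the caller).
    """
    data = []
    for s in states:
        est = {a: sum(simulate(s, a) for _ in range(n_rollouts)) / n_rollouts
               for a in actions}
        data.append((s, max(est, key=est.get)))  # (state, best action)
    return data

class NearestNeighborPolicy:
    """Stand-in for the paper's multiclass SVM: generalizes the rollout
    labels to unseen states by 1-nearest-neighbor over scalar states."""

    def __init__(self, labelled):
        self.labelled = labelled  # list of (state, action) pairs

    def __call__(self, s):
        _, action = min(self.labelled, key=lambda p: abs(p[0] - s))
        return action
```

Policy iteration then alternates: roll out under the current policy to label a fresh batch of states, fit the classifier, and use it as the improved policy.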
Algorithm Selection using Reinforcement Learning
Many computational problems can be solved by multiple algorithms, with different algorithms fastest for different problem sizes, input distributions, and hardware characteristics. We consider the problem of algorithm selection: dynamically choosing an algorithm to attack an instance of a problem with the goal of minimizing the overall execution time. We formulate the problem as a kind of Markov decision process (MDP) and use ideas from reinforcement learning to solve it. This paper introduces a kind of MDP that models the algorithm selection problem by allowing multiple state transitions. The well-known Q-learning algorithm is adapted for this case in a way that combines both Monte Carlo and temporal-difference methods. This work also uses, and extends to control problems, Boyan's Least-Squares Temporal Difference (LSTD) algorithm. The experimental study focuses on the classic problems of order statistic selection and sorting. The encouraging results reveal the potential of applying learning methods to traditional computational problems.
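A minimal sketch of Q-learning for algorithm selection might look like the following; the state and action names, the cost model, and the hyperparameters are illustrative assumptions, not the paper's setup:

```python
import random

def q_learn_selector(costs, states, actions, episodes=2000, alpha=0.2, eps=0.2):
    """One-step Q-learning over (input class, algorithm) pairs.

    costs[(state, action)] -> callable returning a (possibly noisy)
    execution cost; the reward is its negation. States could be
    input-size buckets, actions the candidate algorithms.
    """
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)                 # sample a problem instance
        if random.random() < eps:                 # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: q[(s, b)])
        r = -costs[(s, a)]()                      # lower cost = higher reward
        q[(s, a)] += alpha * (r - q[(s, a)])      # one-step Q-update
    # Final policy: cheapest known algorithm per input class.
    return {s: max(actions, key=lambda b: q[(s, b)]) for s in states}
```

With a cost model where, say, insertion sort wins on small inputs and mergesort on large ones, the learned policy recovers exactly that crossover without being told the costs in advance.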